Katz back-off is a generative ''n''-gram language model that estimates the conditional probability of a word given its history in the ''n''-gram. It accomplishes this estimation by "backing off" to models with shorter histories under certain conditions. By doing so, the model with the most reliable information about a given history is used to provide the better results.

==The method==

The equation for Katz's back-off model is:〔Katz, S. M. (1987). "Estimation of probabilities from sparse data for the language model component of a speech recognizer". IEEE Transactions on Acoustics, Speech, and Signal Processing, 35(3), 400–401.〕

: <math>
P_{bo}(w_i \mid w_{i-n+1} \cdots w_{i-1}) =
\begin{cases}
d_{w_{i-n+1} \cdots w_i} \dfrac{C(w_{i-n+1} \cdots w_{i-1} w_i)}{C(w_{i-n+1} \cdots w_{i-1})} & \text{if } C(w_{i-n+1} \cdots w_i) > k \\[1ex]
\alpha_{w_{i-n+1} \cdots w_{i-1}} \, P_{bo}(w_i \mid w_{i-n+2} \cdots w_{i-1}) & \text{otherwise}
\end{cases}
</math>

where

: ''C''(''x'') = number of times ''x'' appears in training
: ''w''<sub>''i''</sub> = ''i''th word in the given context

Essentially, this means that if the ''n''-gram has been seen more than ''k'' times in training, the conditional probability of a word given its history is proportional to the maximum likelihood estimate of that ''n''-gram. Otherwise, the conditional probability is equal to the back-off conditional probability of the (''n'' − 1)-gram.

The more difficult part is determining the values for ''k'', ''d'' and ''α''.

''k'' is the least important of the parameters. It is usually chosen to be 0. However, empirical testing may find better values for ''k''.

''d'' is typically the amount of discounting found by Good–Turing estimation. In other words, if Good–Turing estimation adjusts a count ''C'' to an adjusted count ''C''<sup>*</sup>, then

: <math>d = \frac{C^*}{C}</math>

To compute ''α'', it is useful to first define a quantity β, which is the left-over probability mass for the (''n'' − 1)-gram:

: <math>
\beta_{w_{i-n+1} \cdots w_{i-1}} = 1 - \sum_{\{w_i : C(w_{i-n+1} \cdots w_i) > k\}} d_{w_{i-n+1} \cdots w_i} \frac{C(w_{i-n+1} \cdots w_{i-1} w_i)}{C(w_{i-n+1} \cdots w_{i-1})}
</math>

Then the back-off weight, α, is computed as follows:

: <math>
\alpha_{w_{i-n+1} \cdots w_{i-1}} = \frac{\beta_{w_{i-n+1} \cdots w_{i-1}}}{\sum_{\{w_i : C(w_{i-n+1} \cdots w_i) \le k\}} P_{bo}(w_i \mid w_{i-n+2} \cdots w_{i-1})}
</math>

The above formula only applies if there is data for the (''n'' − 1)-gram. If not, the algorithm skips the (''n'' − 1)-gram entirely and uses the Katz estimate for the (''n'' − 2)-gram, and so on, until an ''n''-gram with data is found.
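The following Python sketch illustrates the computation above for the bigram case (backing off to unigram probabilities). It assumes ''k'' = 0 and, for simplicity, a constant discount ratio ''d'' in place of the Good–Turing estimate described in the article; the function names and the toy corpus are purely illustrative, not part of Katz's original formulation.

<pre>
from collections import Counter

def train_counts(tokens):
    """Collect unigram and bigram counts from a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def katz_bigram_prob(word, prev, unigrams, bigrams, k=0, d=0.9):
    """P_bo(word | prev) for a bigram Katz back-off model.

    Bigrams seen more than k times get their maximum likelihood estimate
    scaled by the discount ratio d (a constant here, standing in for the
    Good-Turing discount). The probability mass removed this way (beta) is
    redistributed over the remaining words via the back-off weight alpha,
    in proportion to their unigram (lower-order) probabilities.
    """
    total = sum(unigrams.values())

    def p_unigram(w):
        # lower-order (unigram) maximum likelihood estimate
        return unigrams[w] / total

    c_prev = unigrams[prev]
    if c_prev == 0:
        # no data for the history at all: use the lower-order model directly
        return p_unigram(word)

    seen = {w for w in unigrams if bigrams[(prev, w)] > k}
    # discounted probabilities of continuations observed in training
    p_seen = {w: d * bigrams[(prev, w)] / c_prev for w in seen}

    beta = 1.0 - sum(p_seen.values())                      # left-over mass
    denom = sum(p_unigram(w) for w in unigrams if w not in seen)
    alpha = beta / denom if denom > 0 else 0.0             # back-off weight

    return p_seen[word] if word in seen else alpha * p_unigram(word)

# Toy usage: probabilities over the vocabulary given the history "the"
corpus = "the cat sat on the mat the cat ate".split()
uni, bi = train_counts(corpus)
for w in uni:
    print(w, katz_bigram_prob(w, "the", uni, bi))
</pre>

On this toy corpus the probabilities given the history "the" sum to 1: the seen continuations ("cat", "mat") keep 90% of the mass, and the remaining 10% (β) is spread over the unseen words in proportion to their unigram probabilities.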